NASM Examples

Getting Started

Here is a very short NASM program that displays "Hello, World" on a line then exits. Like most programs on this page, you link it with a C library:

asm/nasm/Win32/helloworld.asm
; ----------------------------------------------------------------------------
; helloworld.asm
;
; This is a Win32 console program that writes "Hello, World" on one line and
; then exits.  It needs to be linked with a C library.
; ----------------------------------------------------------------------------

	global	_main
	extern	_printf
	
	section .text
_main:
	push	dword message
	call	_printf
	add	esp, 4
	ret
message:
	db	'Hello, World', 10, 0
	

To assemble, link and run this program under Windows:

    nasm -fwin32 helloworld.asm
    gcc helloworld.obj
    a

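Under Linux, remove the leading underscores from the function names (main and printf instead of _main and _printf). This is 32-bit code, so on a 64-bit system gcc typically also needs the -m32 option and the 32-bit C libraries. Then execute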

    nasm -felf helloworld.asm
    gcc helloworld.o
    ./a.out

Understanding Calling Conventions

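If you are writing assembly language functions that will link with C, and you're using gcc, you must obey the gcc calling conventions. For the 32-bit code on this page, these are: parameters are pushed on the stack from right to left, and the caller removes them after the call returns; on entry to a function the return address is at [esp], the first parameter at [esp+4], the second at [esp+8], and so on; a function may overwrite eax, ecx and edx freely, but must save and restore ebx, esi, edi and ebp if it uses them; an integer result is returned in eax and a floating-point result in st0, the top of the FPU stack.

This program prints the first few Fibonacci numbers, illustrating how registers have to be saved and restored: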

asm/nasm/Win32/fib.asm
; ----------------------------------------------------------------------------
; fib.asm
;
; This is a Win32 console program that writes the first 40 Fibonacci numbers.
; It needs to be linked with a C library.
; ----------------------------------------------------------------------------

	global	_main
	extern	_printf
	
	section .text
_main:
	push	ebx			; we have to save this since we use it
	
	mov	ecx, 40			; ecx will countdown from 40 to 0
	xor	eax, eax		; eax will hold the current number
	xor	ebx, ebx		; ebx will hold the next number
	inc	ebx			; ebx is originally 1
print:
	; We need to call printf, but we are using eax, ebx, and ecx.  printf
	; may destroy eax and ecx so we will save these before the call and
	; restore them afterwards.
	
	push    eax
	push	ecx
	
	push	eax
	push	dword format
	call	_printf
	add	esp, 8
	
	pop	ecx
	pop	eax
	
	mov	edx, eax		; save the current number
	mov	eax, ebx		; next number is now current
	add	ebx, edx		; get the new next number
	dec	ecx			; count down
	jnz	print			; if not done counting, do some more
	
	pop	ebx			; restore ebx before returning
	ret
format:
	db	'%10d', 0
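
For comparison, the same loop can be sketched in C. The function name fib_fill and the choice to store the numbers in an array instead of printing them are assumptions made for illustration; the variables are annotated with the registers that play the same role in fib.asm.

```c
/* Hypothetical C counterpart of the loop in fib.asm: stores the first n
   Fibonacci numbers in out[] instead of printing them. */
void fib_fill(unsigned int out[], int n) {
    unsigned int current = 0;   /* plays the role of eax */
    unsigned int next = 1;      /* plays the role of ebx */
    for (int i = 0; i < n; i++) {
        out[i] = current;
        unsigned int saved = current;  /* mov edx, eax */
        current = next;                /* mov eax, ebx */
        next += saved;                 /* add ebx, edx */
    }
}
```

In C the compiler decides which registers to preserve around a call; in the assembly version that bookkeeping is the explicit push and pop instructions around the call to printf.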
	

Mixing C and Assembly Language

This program is just a simple function that takes in three integer parameters and returns the maximum value. It shows that the parameters will be at [esp+4], [esp+8] and [esp+12], and that the value gets returned in eax.

asm/nasm/Win32/maxofthree.asm
; ----------------------------------------------------------------------------
; maxofthree.asm
;
; NASM implementation of a function that returns the maximum value of its
; three integer parameters.  The function has prototype:
;
;   int maxofthree(int x, int y, int z)
;
; Note that only eax, ecx, and edx were used so no registers had to be saved
; and restored.
; ----------------------------------------------------------------------------	

	global	_maxofthree
	
	section .text
_maxofthree:
	mov	eax, [esp+4]
	mov	ecx, [esp+8]
	mov	edx, [esp+12]
	cmp	eax, ecx
	cmovl	eax, ecx
	cmp	eax, edx
	cmovl	eax, edx
	ret
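
To make the logic of the two cmp/cmovl pairs easier to follow, here is a rough C rendering. The name maxofthree_ref is hypothetical and chosen so that it does not clash with the assembly function; it is a sketch for comparison, not part of the original program.

```c
/* Hypothetical reference version of maxofthree, written to mirror the
   two compare-and-conditionally-move steps of the assembly code. */
int maxofthree_ref(int x, int y, int z) {
    int result = x;              /* mov eax, [esp+4] */
    if (result < y) result = y;  /* cmp eax, ecx / cmovl eax, ecx */
    if (result < z) result = z;  /* cmp eax, edx / cmovl eax, edx */
    return result;               /* value returned in eax */
}
```

Comparing its results with those of the assembly function is a simple way to check the assembly code.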

Here is a C program that calls the assembly language function.

asm/nasm/Win32/callmaxofthree.c
/*
 * callmaxofthree.c
 *
 * Illustrates how to call the maxofthree function we wrote in assembly
 * language.
 */

#include <stdio.h>

int maxofthree(int, int, int);

int main() {
    printf("%d\n", maxofthree(1, -4, -7));
    printf("%d\n", maxofthree(2, -6, 1));
    printf("%d\n", maxofthree(2, 3, 1));
    printf("%d\n", maxofthree(-2, 4, 3));
    printf("%d\n", maxofthree(2, -6, 5));
    printf("%d\n", maxofthree(2, 4, 6));
    return 0;
}

To assemble, link and run this two-part program (on Windows):

    nasm -fwin32 maxofthree.asm
    gcc callmaxofthree.c maxofthree.obj
    a

Command Line Arguments

You know that in C, main is just a plain old function, and it has a couple of parameters of its own:

    int main(int argc, char** argv)

Here is a program that uses this fact to echo its command line arguments, one per line:

asm/nasm/Win32/echo.asm
; ----------------------------------------------------------------------------
; echo.asm
;
; NASM implementation of a program that displays its commandline arguments,
; one per line.
; ----------------------------------------------------------------------------	

	global	_main
	extern	_printf
	
	section .text
_main:
	mov	ecx, [esp+4]	        ; argc
	mov	edx, [esp+8]		; argv
top:
	push	ecx			; save registers that printf wastes
	push	edx
	
	push	dword [edx]		; the argument string to display
	push	dword format		; the format string	
	call	_printf
	add	esp, 8			; remove the two parameters
	
	pop	edx			; restore registers printf used
	pop	ecx
	
	add	edx, 4			; point to next argument
	dec	ecx			; count down
	jnz	top			; if not done counting keep going

	ret
format:
	db	'%s', 10, 0
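
The loop in echo.asm tests its counter only after printing, so it behaves like a C do-while loop. A minimal sketch of that structure follows; the function name echo_args and its FILE parameter are assumptions made so that the loop can be shown outside of main.

```c
#include <stdio.h>

/* Hypothetical C counterpart of the loop in echo.asm.  Like the assembly
   version it tests the count after printing, which is safe because argc
   is at least 1 when a program is started normally. */
int echo_args(int argc, char **argv, FILE *out) {
    int printed = 0;
    do {
        fprintf(out, "%s\n", *argv);  /* push dword [edx] / call _printf */
        argv++;                       /* add edx, 4 */
        printed++;
    } while (--argc != 0);            /* dec ecx / jnz top */
    return printed;
}
```

Advancing argv by one element in C corresponds to adding 4 to edx, because each entry of argv is a 4-byte pointer in 32-bit code.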

Note that as far as the C library is concerned, command line arguments are always strings. If you want to treat them as integers, call atoi. Here's a neat program to compute x^y (x raised to the power y).

asm/nasm/Win32/power.asm
; ----------------------------------------------------------------------------
; power.asm
;
; Command line application to compute x^y
; Syntax: power x y
; x and y are integers
; ----------------------------------------------------------------------------

	global	_main
	extern	_atoi
	extern	_printf

	section	.text
_main:
	push	ebx			; save the registers that must be saved
	push	esi
	push	edi
	
	mov	eax, [esp+16]		; argc (it's not at [esp+4] now :-))
	cmp	eax, 3			; must have exactly two arguments
	jne	error1
	
	mov	ebx, [esp+20]		; argv
	push	dword [ebx+4]		; argv[1]
	call	_atoi
	add	esp, 4
	mov	esi, eax		; x in esi
	push	dword [ebx+8]		; argv[2]
	call	_atoi
	add	esp, 4
	cmp	eax, 0
	jl	error2
	mov	edi, eax		; y in edi

	mov	eax, 1			; start with answer = 1
check:
	test	edi, edi		; we're counting y downto 0
	jz      gotit			; done
	imul	eax, esi		; multiply in another x
	dec	edi
	jmp	check
gotit:					; print report on success
	push    eax
	push    dword answer
	call    _printf
	add     esp, 8
	jmp	done
error1:					; print error message
	push	dword badArgumentCount
	call	_printf
	add	esp, 4
	jmp	done
error2:					; print error message
	push	dword negativeExponent
	call	_printf
	add	esp, 4
done:					; restore saved registers
	pop	edi
	pop	esi
	pop	ebx
	ret
       
answer:
	db      '%d', 10, 0
badArgumentCount:
	db	'Requires exactly two arguments', 10, 0       
negativeExponent:
	db	'The exponent may not be negative', 10, 0
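
The multiplication loop between the labels check and gotit can be sketched in C as follows. The name power_ref is hypothetical, and the sketch assumes that y is not negative and that the result fits in an int; the assembly version wraps around silently on overflow, which signed arithmetic in C does not guarantee.

```c
/* Hypothetical C counterpart of the multiplication loop in power.asm.
   Assumes y >= 0 and that the result fits in an int. */
int power_ref(int x, int y) {
    int answer = 1;          /* mov eax, 1 */
    while (y != 0) {         /* test edi, edi / jz gotit */
        answer *= x;         /* imul eax, esi */
        y--;                 /* dec edi */
    }
    return answer;
}
```

The loop performs y multiplications, so it is only practical for small exponents, which is all a 32-bit result can hold anyway.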

Floating Point Instructions

Here is an example that uses only two floating point instructions, fldz and fadd.

asm/nasm/Win32/sum.asm
; ----------------------------------------------------------------------------
; sum.asm
;
; NASM implementation of a function that returns the sum of all the elements
; in a floating-point array.  The function has prototype:
;
;   double sum(double array[], int length)
; ----------------------------------------------------------------------------	

	global	_sum
	
	section .text
_sum:
	mov	edx, [esp+4]		; address of array
	mov	ecx, [esp+8]		; length of array
	fldz                            ; initialize the sum to 0
	cmp	ecx, 0			; guard against non-positive lengths!
	jle	done
next:
	fadd	qword [edx]             ; add in the current array element
	add	edx, 8                  ; move to next array element
	dec	ecx                     ; count down
	jnz	next			; if not done counting, continue
done:
	ret              		; return value already in st0
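
A C version of the same function is a convenient reference when testing the assembly code. The name sum_ref is hypothetical; the sketch keeps the same guard against non-positive lengths.

```c
/* Hypothetical reference version of sum, with the same guard against
   non-positive lengths as the assembly code. */
double sum_ref(const double array[], int length) {
    double total = 0.0;                  /* fldz */
    for (int i = 0; i < length; i++) {   /* body is skipped if length <= 0 */
        total += array[i];               /* fadd qword [edx] */
    }
    return total;                        /* the assembly version returns st0 */
}
```

Each element is 8 bytes, which is why the assembly loop adds 8 to edx where the C loop increments i.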

Data Sections

The text section is read-only on most operating systems, so you might find the need for a data section. On most operating systems, the data section is only for initialized data, and you have a special .bss section for uninitialized data. Here is a program that averages the command line arguments, expected to be integers, and displays the result as a floating point number. Note that there is no instruction to push an 8-byte value, so we fake it by manipulating esp.

asm/nasm/Win32/average.asm
; ----------------------------------------------------------------------------
; average.asm
;
; NASM implementation of a program that treats all its command line arguments
; as integers, and displays their average as a floating point number.  This
; program uses a data section to store intermediate results, not that it has
; to, but only to illustrate how data sections are used.
; ----------------------------------------------------------------------------	

	global	_main
	extern	_printf
	extern	_atoi
	
	section .text
_main:
	mov	ecx, [esp+4]		; argc
	dec	ecx			; don't count program name
	jz	nothingToAverage
	mov	[count], ecx		; save number of real arguments
	mov	edx, [esp+8]		; argv
accumulate:
	push	ecx			; save values across call to atoi
	push	edx
	push	dword [edx+ecx*4]	; argv[ecx]
	call	_atoi			; now eax has the int value of arg
	add	esp, 4
	pop	edx			; restore registers after atoi call
	pop	ecx
	add	[sum], eax		; accumulate sum as we go
	dec	ecx
	jnz	accumulate		; more arguments?
average:
	fild	dword [sum]
	fild	dword [count]
	fdivp	st1, st0		; sum / count
	sub	esp, 8			; make room for quotient on stack
	fstp	qword [esp]		; "push" quotient
	push	dword format		; push format string
	call	_printf
	add	esp, 12			; 4 bytes format, 8 bytes number
	ret
	
nothingToAverage:
	push	dword error
	call	_printf
	add	esp, 4
	ret
	
	section	.data
count:	dd	0
sum:	dd	0
format:	db	'%.15f', 10, 0
error:	db	'There are no command line arguments to average', 10, 0
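
The computation in average.asm can be sketched in C as follows. The function name average_args is hypothetical, and returning 0.0 when there is nothing to average is an assumption made to keep the sketch short; the assembly program prints an error message in that case.

```c
#include <stdlib.h>

/* Hypothetical C counterpart of the computation in average.asm.  Returns
   the average of argv[1] .. argv[argc-1] converted with atoi, or 0.0 when
   there are no arguments to average (an assumption of this sketch). */
double average_args(int argc, char **argv) {
    int count = argc - 1;            /* don't count the program name */
    if (count == 0) return 0.0;
    int sum = 0;
    for (int i = count; i >= 1; i--) {
        sum += atoi(argv[i]);        /* same back-to-front order as the loop */
    }
    return (double)sum / count;      /* fild / fild / fdivp */
}
```

Here sum and count are ordinary local variables; the assembly program keeps them in the data section only to show how such a section is used.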

Recursion

Perhaps surprisingly, there's nothing out of the ordinary required to implement recursive functions. Each call pushes its own parameters and return address on the stack, so every invocation works with its own copy of n. Here's an example. In C:

    int factorial(int n) {
        return (n <= 1) ? 1 : n * factorial(n-1);
    }

In assembly language:

asm/nasm/Win32/factorial.asm
; ----------------------------------------------------------------------------
; factorial.asm
;
; Illustration of a recursive function.
; ----------------------------------------------------------------------------	

	global _factorial
	
	section .text
_factorial:
        mov     eax, [esp+4]		; n
        cmp	eax, 1			; n <= 1
        jnle	L1			; if not, go do a recursive call
        mov	eax, 1			; otherwise return 1
        jmp	L2
L1:
	dec	eax			; n-1
	push	eax			; push argument
	call	_factorial		; do the call, result goes in eax
	add	esp, 4			; get rid of argument
	imul	eax, [esp+4]		; n * factorial(n-1)
L2:
	ret
	

SIMD Parallelism

The 64-bit MMX registers can do eight byte operations in parallel, or four (16-bit) word operations in parallel, or two (32-bit) doubleword operations in parallel. The 128-bit XMM registers can do 16 byte, 8 word, or 4 doubleword operations in parallel, and can also do floating-point computations in parallel (4 single precision or 2 double precision). Here is a simple function that adds two arrays of 16-bit short ints, four elements at a time, storing the sums in the first array:

asm/nasm/Win32/mmxarrayadd.asm
; ----------------------------------------------------------------------------
; mmxarrayadd.asm
;
; NASM implementation of a function that adds two short arrays.
;
;   void add(short a[], short b[], int n)
; ----------------------------------------------------------------------------	

	global	_add
	
	section .text
_add:
	push	ebx			; callee save register
	
	mov	eax, [esp+8]		; eax points to a
	mov	edx, [esp+12]		; edx points to b
	mov	ecx, [esp+16]		; ecx <- number of items in each array
	or	ecx, ecx		; guard against negative lengths
	js	L4
L1:	
	cmp	ecx, 4			; Less than 4 items left?
	jl	L2			; if so, handle them individually
	movq	mm0, qword [eax]	; Get four items from a
	paddw	mm0, qword [edx]	; Add them with next four items from b
	movq	qword [eax], mm0	; Write them back to a
	add	eax, 8			; Advance a to point to next 4 words
	add	edx, 8			; Advance b to point to next 4 words
	sub	ecx, 4			; We've just handled four
	jmp	L1
L2:
	jecxz	L4			; Are there zero items left?
L3:
	mov	bx, word [eax]		; One word at a time addition
	add	bx, word [edx]
	mov	word [eax], bx
	inc	eax
	inc	eax
	inc	edx
	inc	edx
	dec	ecx
	jnz	L3
L4:	
	emms				; leave MMX state so the caller can use the FPU
	pop	ebx
	ret
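
For reference, the same split into blocks of four plus leftovers can be sketched in plain C. The name add_ref is hypothetical, and the sketch assumes that every sum stays within the range of short; paddw itself wraps around modulo 65536.

```c
/* Hypothetical reference version of add.  It walks the arrays in blocks of
   four and then handles the leftovers, the same split the MMX code makes.
   Sums are assumed to stay within the range of short. */
void add_ref(short a[], const short b[], int n) {
    int i = 0;
    for (; n - i >= 4; i += 4) {          /* the movq / paddw / movq step */
        a[i]     = (short)(a[i]     + b[i]);
        a[i + 1] = (short)(a[i + 1] + b[i + 1]);
        a[i + 2] = (short)(a[i + 2] + b[i + 2]);
        a[i + 3] = (short)(a[i + 3] + b[i + 3]);
    }
    for (; i < n; i++) {                  /* one word at a time */
        a[i] = (short)(a[i] + b[i]);
    }
}
```

The four assignments in the first loop are what a single paddw instruction does at once. A negative n leaves both arrays untouched, as in the assembly version.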

Here is another example, which applies SSE instructions to four single-precision floats at a time:

asm/nasm/Win32/sseexample.asm
; ----------------------------------------------------------------------------
; sseexample.asm
;
; This program demonstrates a few SSE instructions, for no particular reason
; other than to show them off.
; ----------------------------------------------------------------------------

	extern	_printf
	global	_main
	
	section	.text
_main:
	push	esi			; callee save register

; Illustrate packed square root computations	
	movups	xmm3, [x]
	sqrtps	xmm0, xmm3
	movups	[y], xmm0
	call	printall

; Illustrate packed maximums
	movups	xmm2, [x]
	movups	xmm5, [z]
	maxps	xmm2, xmm5
	movups	[y], xmm2
	call	printall

; Done
	pop	esi
	ret

printall:	
	mov	esi, 4
printone:
	; Note printf will NOT ACCEPT single precision floats.
	; We have to convert them to double precision floats. Sigh.
	fld	dword [y-4+esi*4]
	sub	esp, 8
	fstp	qword [esp]
	push	dword format
	call	_printf
	add	esp, 12
	dec	esi
	jnz	printone
	ret
	
	
	section	.data
	align	16
x	dd	10.0
	dd	100.0
	dd	400.0
	dd	653.2664
y	dd	0.0
	dd	0.0
	dd	0.0
	dd	0.0
z	dd	5.0
	dd	900.0
	dd	316.20
	dd	111.0
format	db	'%15.7f', 10, 0


Saturated Arithmetic

This program illustrates saturated addition.

asm/nasm/Win32/satexample.asm
; ----------------------------------------------------------------------------
; satexample.asm
;
; This is a short example of parallel saturated addition using paddsw.
; It takes two 64-bit quantities
;
;   80008FFF0005FEF2
;   800020E07FFE99AA
;
; and performs saturated addition on the four 16-bit blocks in parallel,
; then writes the resulting value, in hex, to standard output.  The answer
; should be
;
;   8000B0DF7FFF989C
; ----------------------------------------------------------------------------

	extern	_printf
	global	_main
	
	section	.text
_main:
	movq	mm0, [x]
	paddsw	mm0, [y]		; Do 4 saturated additions in parallel
	movq	[x], mm0
	emms				; done with MMX; reset the FPU state

	push	dword [x]		; can't push 64 bits at once
	push	dword [x+4]		; nor does printf handle 64-bit ints
	push	dword format
	call	_printf
	add	esp, 12
	ret
	
	section	.data
x	dw	0fef2h, 0005h, 8fffh, 8000h
y	dw	099aah, 7ffeh, 20e0h, 8000h
format	db	'%08X%08X', 10, 0	; zero-pad each half to 8 hex digits
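
To see where the expected answer comes from, the four saturated additions can be modelled in C. This is only a sketch of what paddsw computes; the names lane_to_signed and paddsw_ref are hypothetical.

```c
#include <stdint.h>

/* Interpret the low 16 bits of v as a signed two's-complement number. */
static int32_t lane_to_signed(uint64_t v) {
    int32_t w = (int32_t)(v & 0xFFFF);
    return (w >= 0x8000) ? w - 0x10000 : w;
}

/* Hypothetical model of paddsw on a 64-bit value: add the four 16-bit
   lanes independently, clamping each sum to -32768 .. 32767. */
uint64_t paddsw_ref(uint64_t x, uint64_t y) {
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        int shift = 16 * lane;
        int32_t s = lane_to_signed(x >> shift) + lane_to_signed(y >> shift);
        if (s > 32767) s = 32767;        /* saturate instead of wrapping */
        if (s < -32768) s = -32768;
        result |= (uint64_t)((uint32_t)s & 0xFFFF) << shift;
    }
    return result;
}
```

With the two inputs from the program, the lane 0005 + 7FFE would overflow and is clamped to 7FFF, and the lane 8000 + 8000 is clamped to 8000; the other two lanes are ordinary signed sums.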

Graphics

You probably have the OpenGL graphics library already on your system, so why not call it from an assembly language program:

asm/nasm/Win32/triangle.asm
; ----------------------------------------------------------------------------
; triangle.asm
;
; A very simple *Windows* OpenGL application using the GLUT library.  It 
; draws a nicely colored triangle in a top-level application window.  One
; interesting thing is that the Windows GL and GLUT functions do NOT use the
; C calling convention; instead they use the "stdcall" convention which is
; like C except that the callee pops the parameters.
; ----------------------------------------------------------------------------

	global	_main
	extern	_glClear@4
	extern	_glBegin@4
	extern	_glEnd@0
	extern	_glColor3f@12
	extern	_glVertex3f@12
	extern	_glFlush@0
	extern	_glutInit@8
	extern	_glutInitDisplayMode@4
	extern	_glutInitWindowPosition@8
	extern	_glutInitWindowSize@8
	extern	_glutCreateWindow@4
	extern	_glutDisplayFunc@4
	extern	_glutMainLoop@0
	
	section	.text
title:	db	'A Simple Triangle', 0
zero:	dd	0.0
one:	dd	1.0
half:	dd	0.5
neghalf:dd	-0.5

display:
	push	dword 16384
	call	_glClear@4		; glClear(GL_COLOR_BUFFER_BIT)
	push	dword 9
	call	_glBegin@4		; glBegin(GL_POLYGON)
	push	dword 0
	push	dword 0
	push	dword [one]
	call	_glColor3f@12		; glColor3f(1, 0, 0)
	push	dword 0
	push	dword [neghalf]
	push	dword [neghalf]
	call	_glVertex3f@12		; glVertex(-.5, -.5, 0)
	push	dword 0
	push    dword [one]
	push	dword 0
	call	_glColor3f@12		; glColor3f(0, 1, 0)
	push	dword 0
	push	dword [neghalf]
	push	dword [half]
	call	_glVertex3f@12		; glVertex(.5, -.5, 0)
	push	dword [one]
	push	dword 0
	push	dword 0
	call	_glColor3f@12		; glColor3f(0, 0, 1)
	push	dword 0
	push	dword [half]
	push	dword 0
	call	_glVertex3f@12		; glVertex(0, .5, 0)
	call	_glEnd@0		; glEnd()
	call	_glFlush@0		; glFlush()
	ret

_main:
	push	dword [esp+8]		; push argv
	lea	eax, [esp+8]		; get addr of argc (offset changed :-)
	push	eax
	call	_glutInit@8		; glutInit(&argc, argv)
	push	dword 0
	call	_glutInitDisplayMode@4
	push	dword 80
	push	dword 80
	call	_glutInitWindowPosition@8
	push	dword 300
	push	dword 400
	call	_glutInitWindowSize@8
	push	dword title
	call	_glutCreateWindow@4
	push	dword display
	call	_glutDisplayFunc@4
	call	_glutMainLoop@0
	ret
	

Local Variables

After entering a function, we can reserve space for local variables by decrementing the stack pointer. For example, the C function

int example(int x, int y) {
  int a, b, c;
  b = 7;
  return x * b + y;
}

can be translated as follows:

_example:
	sub	esp, 12			; make room for 3 ints
	mov	dword [esp+4], 7	; b = 7
	mov	eax, [esp+16]		; x
	imul	eax, [esp+4]	        ; x * b
	add	eax, [esp+20]		; x * b + y
	add	esp, 12			; release the locals so ret finds retaddr
	ret

After "sub esp, 12" the stack looks like:

                +---------+
         esp    |    a    |
                +---------+
         esp+4  |    b    |
                +---------+
         esp+8  |    c    |
                +---------+
         esp+12 | retaddr |
                +---------+
         esp+16 |    x    |
                +---------+
         esp+20 |    y    |
                +---------+

Stack Frames

Sometimes it is a real pain to try to keep track of the offsets of your parameters and local variables because the stack pointer keeps changing. For example, in

int example(int x, int y) {
  int a, b, c;
  ...
  f(y, a, b, b, x);
  ...
}

you cannot translate the function call as

	push	dword [esp+16]
	push	dword [esp+4]	; WRONG! b is really now at [esp+8]
	push	dword [esp+4]	; WRONG! b is really now at [esp+12]
	push	dword [esp]	; WRONG! a is really now at [esp+12]
	push	dword [esp+20]	; WRONG! y is really now at [esp+36]
	call	f
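
The arithmetic behind those comments is simple enough to write down as a one-line C function. The helper name offset_after_pushes is hypothetical; it only restates the rule that every 4-byte push lowers esp by 4.

```c
/* Hypothetical helper: the esp-relative offset of a stack slot after some
   number of 4-byte pushes.  Each push lowers esp by 4, so every existing
   slot ends up 4 bytes further away from esp. */
int offset_after_pushes(int original_offset, int pushes) {
    return original_offset + 4 * pushes;
}
```

For example, b starts at [esp+4], so after one push it is at [esp+8] and after two pushes at [esp+12]; y starts at [esp+20] and is at [esp+36] after four pushes.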

For this reason, many functions use the ebp register to index the "stack frame" of local variables and parameters, like this:

	push	ebp			; must save old ebp
	mov	ebp, esp		; point ebp to this frame
	sub	esp, ___		; make space for locals
	...
	mov	esp, ebp		; clean up locals
	pop	ebp			; restore old ebp
	ret

As long as you never change ebp throughout the function, all your local variables and parameters will always be at the same offset from ebp. The stack frame for our example function is now:

                +---------+
         ebp-12 |    a    |
                +---------+
         ebp-8  |    b    |
                +---------+
         ebp-4  |    c    |
                +---------+
         ebp    | old ebp |
                +---------+
         ebp+4  | retaddr |
                +---------+
         ebp+8  |    x    |
                +---------+
         ebp+12 |    y    |
                +---------+